Abstract

One of the most important steps in biomedical longitudinal studies is choosing a good experimental design that can 
provide high accuracy in the analysis of results with a minimum sample size. Several methods for constructing efficient 
longitudinal designs have been developed based on power analysis and the statistical model used for analyzing the final 
results. However, these methods are not yet available to practitioners through user-friendly software. In this 
paper we introduce LADES (Longitudinal Analysis and Design of Experiments Software) as an alternative and easy-to-use 
tool for conducting longitudinal analysis and constructing efficient longitudinal designs. LADES incorporates methods for 
creating cost-efficient longitudinal designs, unequal longitudinal designs, and simple longitudinal designs. In addition, 
LADES includes different methods for analyzing longitudinal data, such as linear mixed models and generalized estimating 
equations, among others. A study of European eels is reanalyzed in order to show LADES capabilities. Three treatments 
contained in three aquariums with five eels each were analyzed. Data were collected from 0 up to the 12th week post 
treatment for all the eels (complete design). The response under evaluation is sperm volume. A linear mixed model was 
fitted to the results using LADES. The complete design had a power of 88.7% using 15 eels. With LADES we propose the use 
of an unequal design with only 14 eels and 89.5% efficiency. LADES was developed as a powerful and simple tool to 
promote the use of statistical methods for analyzing and creating longitudinal experiments in biomedical research. 
Introduction

Most biomedical research projects are planned according to the 
number of measurements required to reach a desired level of accuracy 
when testing the hypothesis of interest. The number of laboratory 
animals required depends on several factors, such as the budget 
available to obtain the animals; the costs of housing, adaptation 
time, and qualified personnel; and the time and cost of the research 
procedure. However, animal ethics represents the most important 
'cost' of a biomedical study. Thus, one of the most common 
and critical challenges in biomedical studies involving animal 
models is determining the minimum number of individuals needed to 
achieve a given statistical power or validity. The National Center 
for the Replacement, Refinement and Reduction of Animals in Research 
(NC3Rs) promotes the use of experimental design and statistical 
analysis as important tools for using as few animals as possible 
without affecting the efficiency of the study. Therefore, biomedical 
researchers are always looking for cost-efficient designs, i.e., 
designs with the lowest cost and the highest possible accuracy in 
their results. A comprehensive guideline for the design of 
experiments with laboratory animals is presented in [1]. Moreover, 
methods for evaluating the efficiency or power of factorial and other 
types of experimental designs are described in [2]. The most common 
approaches for calculating sample size in animal experiments are 
provided in [3]. However, these methods work only with a 
univariate approach, i.e., one observation per subject. Such a 
univariate approach may be useful in some cases; however, most 
biomedical studies focus on evaluating development over time, for 
example, of a new drug, or the effects of a new treatment during a 
time period. Such studies are special because an individual's 
response at one time point influences the next; 
i.e., within-subject observations are correlated. Longitudinal 
analysis is very useful when we want a statistical model that 
includes the correlation between within-subject measures. Popular 
techniques for analyzing longitudinal data are linear mixed models, 
as noted in [4]. Other options are generalized linear mixed models 
[5], [6]; generalized estimating equations [7], [8]; and 
generalized least squares as described in [9], among others. With 
regard to the problem of sample size, the approach for assessing the 
number of animals needed for a longitudinal study varies according 
to the statistical model and methods to be used. For generalized 
estimating equations, the formulas provided in [10] are very 
useful. Furthermore, for the linear mixed model with continuous 
responses, the F-Helms statistic provided in [11] is a direct way to 
calculate statistical power in longitudinal designs. Under this 
framework, Helms proposed a methodology for creating efficient 
incomplete designs. This type of design is very useful in animal 
experiments, since we can calculate the required sample size under 
different limits on the cost of maintaining animals throughout 
the experiment. Despite these important contributions, the 



PLOS ONE | www.plosone.org 






July 2014 | Volume 9 | Issue 7 | e100570 



LADES for Constructing and Analyzing Longitudinal Designs 



analysis of power for longitudinal experiments is still an open area 
of study, given the value of research on more complex statistical 
models that can be applied to biomedical studies [12], [13], [14], [15]. 
Advances have been made in the computation of statistical power 
and in the optimal design of experiments. For 
example, Reich and colleagues provide a data-generating model 
in a freely available R package (clusterPower), in which 
statistical power is estimated empirically for cluster-randomized 
and cluster-randomized crossover designs [16]. New 
biostatistical methods for calculating power and sample size for 
microbiome data analysis are presented in [17] and implemented in the 
R package HMP (Hypothesis testing and power calculations 
for comparing metagenomic samples from HMP). However, such 
code-based R packages are difficult to use for biomedical 
professionals who have no programming experience. 
Moreover, [18] proposed constructing optimal 
experimental designs to reduce the time and cost of modeling 
biochemical reaction networks with coupled differential equations, 
a task that can be done using ModelDiscriminationToolkitGUI. 
Other software packages exist for sample size calculation in 
longitudinal models. nQuery [19] includes sample size 
computations for repeated measures analysis with continuous 
and binary responses; however, it cannot design longitudinal 
experiments for more than two groups, and it is not free. The 
commercial software PASS 12 [20] also includes sample size estimation 
for repeated measures analysis, but allows only up to three 
groups. Moreover, it does not include statistical methods for 
analyzing the results once the data have been collected. The Optimal 
Design Software for Multilevel and Longitudinal Research [21] is 
free software that produces graphical analyses for sample size 
estimation in a wide variety of social research studies performed 
in classrooms, schools, communities, clinics, etc. Nonetheless, since 
this software focuses only on applications in social 
research, applying its power analysis tools through its user 
interface to studies in other fields, such as biomedicine or 
engineering, is neither direct nor easy. 

As a result, we believe that good software must meet certain 
conditions: it should offer a broad set of useful methods for sample 
size calculation and data analysis, a friendly user interface, the 
flexibility to be used in other kinds of studies (e.g., industrial or 
psychological), and accessibility for most researchers and biomedical 
professionals. The purpose of this paper is to present LADES 
(Longitudinal Analysis and Design of Experiments Software) for 
longitudinal data analysis and the construction of efficient 
longitudinal designs. LADES provides tools for the construction and 
analysis of longitudinal experiments through a user-friendly 
interface. LADES is a free and good alternative for biomedical and 
medical researchers who are not familiar with specialized statistical 
methods. In addition, it was created because few computer programs 
are aimed at creating cost-efficient longitudinal designs. 
The capacity of LADES to build cost-efficient designs is demonstrated 
by analyzing a study of the European eel in which the response of 
interest was the sperm volume produced. 

Materials and Methods 

Introduction to LADES 

LADES includes statistical methods for designing biomedical 
studies and analyzing their results. One of its main 
objectives is to provide an easy-to-use interface for biomedical 
researchers and professionals who have only a basic knowledge of 
statistical methods. LADES includes a module for calculating 
sample size in specific longitudinal data problems 
with continuous and/or binary responses. Another module 
creates a variety of experimental designs and optimal 
designs based on [22]. LADES provides statistical methods such as 
generalized estimating equations, for modeling covariates like Time 
and the correlation between outcomes; linear mixed models, for 
including fixed and random effects in the model; generalized linear 
mixed models, for analyzing discrete outcomes; and generalized 
least squares, to cope with unequal variances of the observations. 

Moreover, LADES can evaluate and create designs that are 
optimal in cost and power, using power analysis and the F-Helms 
statistic. Given the importance of these issues, 
LADES is an excellent alternative for the analysis and evaluation 
of longitudinal experimental designs for biomedical studies. 

LADES was built using the Java programming language, and all of its 
statistical functions are implemented in R [23]. LADES uses numerous 
R packages, including nlme [24], ggplot2 [25], AlgDesign [26], 
FrF2 [27], and geepack [28]. LADES is available from its project home 
page: http://cimat.mx/~hectorhdez/lades/index.html. It runs on 
Windows 7 and Windows 8 operating systems. 

In the following sections, some of the capabilities of LADES 
are presented using real data. All information about the 
installation of the software, how to run it, and how to perform the 
calculations shown in the following sections can be found in the 
Manual available in the More Information section of the LADES 
home page. 

Problem 

The data come from the study in [29], in which the 
amount of sperm produced by European eels was evaluated. Our 
purpose here is to build a longitudinal design that 
optimizes the number of eels while providing reasonable power 
to test the difference between the slopes of three treatments. 
The experimental setup was as follows. Three aquariums 
containing five eels each were assigned three different treatments 
(A, B, and C). Thirteen measurements were made over time: one 
on the day treatment started and one in each of the following twelve 
weeks post treatment. Therefore, the experimental design was 
a complete longitudinal design with 15 eels and 13 measurements over 
time. The response in this study was the sperm volume 
(ml 100 g fish$^{-1}$) of each eel. The entire procedure for 
obtaining an efficient longitudinal design for comparing the slopes 
of the three treatments is shown next. 

Plotting. To obtain an overview of the sperm volume profiles 
of the eels under study, the Longitudinal 
Data Graph function in LADES was used. Figure 1 depicts the 
sperm volume profiles for European eels grouped by Treatment. 
The graph shows differences among the starting points of the eels; 
therefore, a linear mixed model with a random 
intercept was used to model these discrepancies. The random 
intercept helps us model variation due to differences among eels 
(e.g., genetics). Moreover, Figure 1 provides insight into 
the differences among the average slopes of the three groups. 

Figure 1. Sperm volume profiles for European eels grouped by Treatment. The CHIP variable was used to identify each eel. Eel 098C did not 
present sperm volume in the last two weeks. 
doi:10.1371/journal.pone.0100570.g001

Fitting the Statistical Model. The proposed statistical model for 
the sperm production data of European eels is: 

$$
y_{ij} =
\begin{cases}
(\beta_0 + b_i) + \beta_1 t_{ij} + \epsilon_{ij}, & \text{Aquarium A} \\
(\beta_2 + b_i) + \beta_3 t_{ij} + \epsilon_{ij}, & \text{Aquarium B} \\
(\beta_4 + b_i) + \beta_5 t_{ij} + \epsilon_{ij}, & \text{Aquarium C}
\end{cases}
\qquad (1)
$$

where $y_{ij}$ represents the $j$-th measurement of sperm production 
for the $i$-th eel, taken at time $t_{ij}$. The parameters $\beta_0$, 
$\beta_2$, and $\beta_4$ represent the average 
response at the time of treatment, whereas $\beta_1$, $\beta_3$, and 
$\beta_5$ represent the average slopes for Aquarium A, Aquarium B 
and Aquarium C, respectively. $b_i \sim N(0, \sigma_b^2)$ is the 
random effect representing the difference between the initial sperm 
volume of each eel and the overall mean (as indicated in the previous 
section), and $\epsilon_{ij} \sim N(0, \sigma_e^2)$ are the model 
residuals. Moreover, $b_i$ and $\epsilon_{ij}$ are independent of 
each other. We use this model because it 
leads to a more direct comparison of the slopes of the three 
treatments. 
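
The random intercept is what carries the within-subject correlation mentioned in the Introduction. As a short worked derivation under model (1), assuming $\sigma_b^2$ and $\sigma_e^2$ denote the variances of $b_i$ and $\epsilon_{ij}$ and that the residuals are mutually independent, two measurements $j \neq j'$ on the same eel satisfy:

$$
\operatorname{Var}(y_{ij}) = \sigma_b^2 + \sigma_e^2, \qquad
\operatorname{Cov}(y_{ij}, y_{ij'}) = \sigma_b^2, \qquad
\operatorname{Corr}(y_{ij}, y_{ij'}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}.
$$

Measurements on different eels are uncorrelated under the same assumptions, so a larger between-eel variance relative to the residual variance means more strongly correlated profiles within each eel.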

We fitted linear mixed model (1) by Restricted Maximum 
Likelihood [30], which is the method used for estimating the 
parameters of interest. The estimates returned by the Linear Mixed 
Model function in LADES were 

$$
\hat\beta_0 = 132.76,\ \hat\beta_1 = -2.01,\ \hat\beta_2 = 171.57,\ 
\hat\beta_3 = -1.90,\ \hat\beta_4 = 142.55,\ \hat\beta_5 = -0.876,\ 
\hat\sigma_b = 27.00 \ \text{and}\ \hat\sigma_e = 7.59.
\qquad (2)
$$
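
To make the fitted model concrete, the following is a minimal Python sketch that draws one artificial data set from model (1) for the complete design (three aquariums, five eels each, weeks 0 to 12). It is not LADES code. It assumes that the six fixed-effect estimates in (2) are listed in the order intercept and slope for Aquarium A, B and C, and that the two remaining values are the standard deviations of the random intercept and of the residuals. The function name and the eel labels are hypothetical.

```python
import random

# Estimates from equation (2): (intercept, slope) for each aquarium.
FIXED_EFFECTS = {
    "A": (132.76, -2.01),
    "B": (171.57, -1.90),
    "C": (142.55, -0.876),
}
SIGMA_B = 27.00  # assumed: standard deviation of the random intercept b_i
SIGMA_E = 7.59   # assumed: standard deviation of the residual e_ij


def simulate_complete_design(eels_per_aquarium=5, weeks=range(13), seed=0):
    """Draw one data set from y_ij = (intercept + b_i) + slope * t_ij + e_ij."""
    rng = random.Random(seed)
    rows = []
    for aquarium, (intercept, slope) in FIXED_EFFECTS.items():
        for k in range(eels_per_aquarium):
            eel_id = f"{aquarium}{k + 1}"      # hypothetical label, not a CHIP code
            b_i = rng.gauss(0.0, SIGMA_B)      # one random intercept per eel
            for t in weeks:
                e_ij = rng.gauss(0.0, SIGMA_E)  # independent residual per occasion
                volume = intercept + b_i + slope * t + e_ij
                rows.append((aquarium, eel_id, t, volume))
    return rows


data = simulate_complete_design()
print(len(data))  # number of simulated observations: 15 eels x 13 occasions
```

Each simulated row mirrors the Aquarium, CHIP, Weeks and Volume columns of the data set, so repeated draws with different seeds give a feel for how much the profiles vary between eels relative to the variation within an eel.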



Evaluating Power and Cost. We want to compare the 
evolution of sperm volume over time in the three aquariums. 
Therefore, the hypothesis was based on the mean slopes of 
the three treatments. The hypothesis to test is: 

$$
H_0: \beta_1 = \beta_3 = \beta_5
\qquad (3)
$$

using an alpha of 0.05. The F-Helms statistic is used to test 
this hypothesis. 
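
For orientation, tests of a linear hypothesis $L\beta = 0$ in a linear mixed model are usually based on a Wald-type F statistic of the general form below. This is a sketch in standard notation rather than a transcription of the formulas in [11]: the symbols $L$, $X_i$ and $V_i$ are introduced here for illustration, and the denominator degrees of freedom used by LADES are not stated in this paper.

$$
F = \frac{(L\hat\beta)^{\top}
\left[ L \left( \sum_i X_i^{\top} \hat V_i^{-1} X_i \right)^{-1} L^{\top} \right]^{-1}
(L\hat\beta)}{\operatorname{rank}(L)},
\qquad
V_i = \sigma_b^2 \, \mathbf{1}\mathbf{1}^{\top} + \sigma_e^2 \, I,
$$

where $X_i$ is the fixed-effects design matrix of eel $i$ and $V_i$ is the marginal covariance matrix of its measurements implied by a random-intercept model. For a hypothesis stating that three slopes are equal, $L$ has two rows and $\operatorname{rank}(L) = 2$. The design enters through $X_i$ and the number of eels per group, which is why changing the allocation of eels changes the power.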

LADES assesses the cost of a longitudinal design using the 
following simple formula: 

$$
\text{cost} = N \times \text{MCRS} + K \times \text{COS},
\qquad (4)
$$

where N is the total number of subjects, K is the total number of 
observations over all subjects, MCRS is the marginal cost of 
including (e.g., purchasing) an individual in the study, and COS is 
the cost of evaluating one individual on one occasion. For this 
problem, we use the values 70 and 3.3 Euros for MCRS and COS, 
respectively. For the complete design, N = 15 and K = 15 x 13 = 195, 
which gives the total cost of 1693.5 Euros reported for that design. 
Such values are rough estimates (based on research experience) 
of the true cost of experimentation and the price of a single European 
eel, and are used only to illustrate the software's capabilities. 
These values represent the case in which the cost of acquiring an 
individual is much higher than the cost of obtaining an observation. 
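
The cost formula (4) can be checked with a few lines of Python. This is an illustrative sketch, not LADES code: the function name is hypothetical, and K is taken to be the total number of observations summed over all eels, an interpretation that reproduces the cost of 1693.5 Euros reported for the complete design.

```python
def design_cost(n_subjects, n_observations, mcrs, cos):
    """Cost formula (4): N * MCRS + K * COS.

    n_observations is K, assumed here to be the total number of
    observations over all subjects in the design.
    """
    return n_subjects * mcrs + n_observations * cos


# Complete design: 15 eels, each measured on 13 occasions.
full_cost = design_cost(n_subjects=15, n_observations=15 * 13, mcrs=70.0, cos=3.3)
print(round(full_cost, 2))  # 1693.5
```

Because MCRS is much larger than COS in this example, removing one eel saves far more than removing a handful of measurement occasions, which is the motivation for searching over designs with fewer eels.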



Using the Longitudinal Design Evaluate function, we can 
calculate the power that a design provides for testing hypothesis 
(3), as well as the cost of the design. This function also allowed us 
to evaluate the intentionally incomplete longitudinal designs 
proposed by Helms. The function requires the number 
and labels of the time points, the parameter estimates shown in (2), 
and the contrasts of interest shown in (5). It returns the p-value 
based on the F-Helms statistic, the power of the design, and the 
total cost of the design. The 
full original design had p = 0.002, so there is a significant 
difference between the mean slopes of the three aquariums. 
The design showed 88.7% power to test the proposed hypothesis 
and had a total cost of 1693.5 Euros. 



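$$
\begin{array}{cccccc}
\beta_0 & \beta_1 & \beta_2 & \beta_3 & \beta_4 & \beta_5 \\
\hline
0 & -1 & 0 & 1 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & 1
\end{array}
\qquad (5)
$$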
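
Each row of the contrast matrix in (5) picks out one slope difference, so the hypothesis of equal slopes holds exactly when both contrasts are zero. The short Python sketch below applies the two contrasts to the estimates in (2). It assumes the estimates are ordered from the first to the sixth fixed-effect parameter, and the helper name is hypothetical.

```python
# Fixed-effect estimates from (2), in parameter order (assumed ordering).
BETA_HAT = [132.76, -2.01, 171.57, -1.90, 142.55, -0.876]

# Contrast matrix from (5): slope of B minus slope of A, slope of C minus slope of A.
CONTRASTS = [
    [0, -1, 0, 1, 0, 0],
    [0, -1, 0, 0, 0, 1],
]


def apply_contrasts(contrasts, beta):
    """Return the vector L * beta, one value per contrast row."""
    return [sum(c * b for c, b in zip(row, beta)) for row in contrasts]


differences = apply_contrasts(CONTRASTS, BETA_HAT)
print([round(d, 3) for d in differences])  # [0.11, 1.134]
```

The second contrast is about ten times larger than the first, which suggests that the estimated slope of Aquarium C is the one that differs most from Aquarium A.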



Creating Unequal Longitudinal Designs. Unequal longitudinal 
designs (ULD) assign different numbers of individuals 
to the groups in order to find the allocation that maximizes the 
power of the test under study. For our problem we can define such 
designs as follows: 

$$
\text{Design}_k = (n_A, n_B, n_C)
\qquad (6)
$$

where $n_A$, $n_B$ and $n_C$ are the numbers of eels in Aquarium A, B 
and C, respectively. Two constraints are established and described as 
follows: 

1. $n_A, n_B, n_C \geq 2$. At least 2 eels per group, according to 
the researchers' experience; and, 

2. $n_A + n_B + n_C \leq 15$. Designs using no more than 15 eels 
(the size of the full design) are desired. 
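
The candidate unequal designs can be enumerated directly. The Python sketch below is illustrative and is not how LADES is implemented: it assumes the two constraints mean at least 2 eels per aquarium and at most 15 eels in total, and the function name is hypothetical.

```python
from itertools import product


def candidate_designs(min_per_group=2, max_total=15):
    """List every allocation (n_A, n_B, n_C) that satisfies both constraints."""
    sizes = range(min_per_group, max_total + 1)
    return [
        (n_a, n_b, n_c)
        for n_a, n_b, n_c in product(sizes, repeat=3)
        if n_a + n_b + n_c <= max_total
    ]


designs = candidate_designs()
print(len(designs))  # 220 candidate allocations

# Allocations that use exactly 14 eels, one fewer than the complete design.
designs_with_14 = [d for d in designs if sum(d) == 14]
print(len(designs_with_14))  # 45
```

The search space is small enough to evaluate exhaustively, so the power and cost of every allocation can be computed and compared against the original design.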

The Unequal Longitudinal Designs function in LADES 
generates unequal designs. As inputs, this function requires the 
parameter estimates, the hypothesis (contrasts), and the power and 
cost of the original design, against which the new designs are 
compared. For the eel study, the function produced two outputs. The 
first, ULD Design Graph, is a graph of total number of eels versus 
power that depicts all designs requiring no more eels than the 
original design and having a power greater than 
0.8, a conventional threshold for 'high power' ([31, 
page 56], [32]); see Figure 2. The second, ULD Designs, is 
a table listing all the characteristics of the constructed 
cost-efficient designs (Table 1). 
Figure 2. Unequal Cost-Efficient Designs for the European Eel 
Study. Red lines represent the total number of eels (15) and power 
(0.887) of the original longitudinal design. Four designs using fifteen 
eels or less are more powerful than the original design. 
doi:10.1371/journal.pone.0100570.g002 

In Figure 2 we can detect an experimental design having more 
power than the original design (red horizontal line) and requiring 
only fourteen eels. 

Results and Discussion 

For ethical, economic, and time reasons, it is important to design 
animal experiments that reach the power targets set by biomedical 
researchers while using the lowest possible number of individuals. 
At the same time, the design must retain enough observations to 
detect significant effects in the process under study. Accordingly, 
researchers and professionals are encouraged 
to consult a statistician at the design stage and 
to have a clear idea of the 
model and statistical analysis that will be applied to the resulting 
data. These guidelines are intended to help biomedical researchers 
and professionals carry out experiments efficiently. LADES supports 
researchers in this task from a statistical point of view when 
designing animal experiments. LADES helps construct 
efficient experimental designs according to the budget and goals of 
the biomedical research and the power of the statistical tools 
required. To our knowledge, no other program 
allows users to calculate the required sample 
size for a target power while also accounting for 
other factors, such as the number of repeated measurements, the 
number of samples required and, above all, costs. In this research, 23 
simulations were carried out with LADES. We found that 
researchers can run simulations that vary the number of 
replicates, the number of individuals, the costs, and other factors, 
which streamlines the choice of design for a study. Moreover, power 
can be inspected across designs ordered from the fewest to the most 
individuals, allowing the 
researcher to decide on the number of samples to use 
in his/her project. Analysis of the results obtained in LADES and 
checked in parallel with the actual experiment, are presented in 
efficient for new studies involving the same conditions and factors 
under study. At the designing stage of a new study, the results from 
Future LADES development will focus on including a module 
for sample size estimation under drop-out and functions for 
performing robust analysis to designs. The next version will also 
aim to improve software speed and stability. 

Supporting Information 

Dataset S1 European eel data. Aquarium refers to treatment. 
CHIP is the identification number for each eel. Weeks 
represents time, i.e., the number of weeks into the study. Volume is 
the eel sperm volume in ml 100 g fish$^{-1}$. B0A and B1A are 
auxiliary columns for estimating the effects of the intercept and 
average slope of Aquarium A, respectively, under the parametrization 
proposed in model 1. The same applies to Aquarium B and C. 
(CSV) 

Author Contributions 

Conceived and designed the experiments: DLGC AVA. Performed the 
experiments: DLGC. Analyzed the data: AVA. Contributed reagents/ 
materials/analysis tools: AVA RMSC. Wrote the paper: ARVA DLGC 
RMSC. Software Author: AVA DLGC. 



4. Laird NM, Ware JH (1982) Random-effects models for longitudinal data. 
Biometrics 38: 963-974. 

5. Verbeke G, Molenberghs G (2009) Linear Mixed Models for Longitudinal Data. 
Springer Series in Statistics. Springer. Available: 
http://books.google.com.mx/books?id=jmPkX4VU7hOC. 






6. Arnau J, Balluerka N, Bono R, Gorostiaga A (2010) General linear mixed model 
for analysing longitudinal data in developmental research. Percept Mot Skills 
110: 547-566. 

7. Liang KY, Zeger SL (1986) Longitudinal data analysis using generalized linear 
models. Biometrika 73: 13-22. 

8. Zeger SL, Liang KY, Albert PS (1988) Models for longitudinal data: A 
generalized estimating equation approach. Biometrics 44: 1049-1060. 

9. Pinheiro J, Bates D (2009) Mixed-Effects Models in S and S-PLUS. Statistics and 
Computing. Springer. Available: 
http://books.google.com.mx/books?id=y54QDUTmvDcC. 

10. Diggle P, Liang K, Zeger S (1994) Analysis of longitudinal data. Oxford 
statistical science series. Clarendon Press. Available: 
http://books.google.com.mx/books?id=955qAAAAMAAJ. 

11. Helms RW (1992) Intentionally incomplete longitudinal designs: I. methodology 
and comparison of some full span designs. Stat Med 11: 1889-1913. 

12. Roy A, Bhaumik DK, Aryal S, Gibbons RD (2007) Sample size determination 
for hierarchical longitudinal designs with differential attrition rates. Biometrics 
63: 699-707. 

13. Huang W, Fitzmaurice GM (2005) Analysis of longitudinal data unbalanced 
over time. Journal of the Royal Statistical Society Series B (Statistical 
Methodology) 67: 135-155. 

14. Reboussin BA, Miller ME, Lohman KK, Have TRT (2002) Latent class models 
for longitudinal studies of the elderly with data missing at random. Journal of the 
Royal Statistical Society Series C (Applied Statistics) 51: 69-90. 

15. de Jong K, Moerbeek M, van der Leeden R (2010) A priori power analysis in 
longitudinal three-level multilevel models: an example with therapist effects. 
Psychother Res 20: 273-284. 

16. Reich NG, Myers JA, Obeng D, Milstone AM, Perl TM (2012) Empirical power 
and sample size calculations for cluster-randomized and cluster-randomized 
crossover studies. PLoS One 7: e35564. 

17. La Rosa PS, Brooks JP, Deych E, Boone EL, Edwards DJ, et al. (2012) 
Hypothesis testing and power calculations for taxonomic-based human 
microbiome data. PLoS One 7: e52078. 

18. Stegmaier J, Skanda D, Lebiedz D (2013) Robust optimal design of experiments 
for model discrimination using an interactive software tool. PLoS ONE 8: 
e55723. 



19. Field A (1998) Statistical software for microcomputers: MLwiN and nQuery 
Advisor. British Journal of Mathematical and Statistical Psychology 51: 367-370. 

20. Hintze J (2013) PASS 12. NCSS, LLC, Kaysville, Utah, USA. Available: 
www.ncss.com. 

21. Raudenbush SW, et al. (2011) Optimal Design Software for Multilevel and 
Longitudinal Research (Version 3.01) [Software]. Available: 
www.wtgrantfoundation.org. 

22. Atkinson A, Donev A (1992) Optimum Experimental Designs. Oxford science 
publications. Clarendon Press. Available: 
http://books.google.com.mx/books?id=cmmOA_-M7S0C. 

23. R Core Team (2013) R: A Language and Environment for Statistical 
Computing. R Foundation for Statistical Computing, Vienna, Austria. 
Available: http://www.R-project.org/. 

24. Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2013) nlme: Linear 
and Nonlinear Mixed Effects Models. R package version 3.1-113. 

25. Wickham H (2009) ggplot2: elegant graphics for data analysis. Springer New 
York. Available: http://had.co.nz/ggplot2/book. 

26. Wheeler B (2011) AlgDesign: Algorithmic Experimental Design. Available: 
http://CRAN.R-project.org/package=AlgDesign. R package version 1.1-7. 

27. Gromping U (2014) R package FrF2 for creating and analyzing fractional 
factorial 2-level designs. Journal of Statistical Software 56: 1-56. 

28. Hojsgaard S, Halekoh U, Yan J (2006) The R package geepack for generalized 
estimating equations. Journal of Statistical Software 15/2: 1-11. 

29. Asturiano JF, Marco-Jimenez F, Perez L, Balasch S, Garzon DL, et al. (2006) 
Effects of hCG as spermiation inducer on European eel semen quality. 
Theriogenology 66: 1012-1020. 

30. Zuur A, Ieno E, Walker N, Saveliev A, Smith G (2009) Mixed Effects Models 
and Extensions in Ecology with R. Statistics for biology and health. Springer. 
Available: http://books.google.com.mx/books?id=vQUNprFZKHsC. 

31. Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences. L. 
Erlbaum Associates. Available: 
http://books.google.com.mx/books?id=T10N21RAO9oC. 

32. Peterman RM (1990) Statistical power analysis can improve fisheries research 
and management. Canadian Journal of Fisheries and Aquatic Sciences 47: 2-15. 






